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Abstract 

The optimal selection of experimental conditions is essential to maximizing the 
value of data for inference and prediction, particularly in situations where experiments 
are time-consuming and expensive to conduct. We propose a general mathematical 
framework and an algorithmic approach for optimal experimental design with non- 
linear simulation-based models; in particular, we focus on finding sets of experiments 
that provide the most information about targeted sets of parameters. 

Our framework employs a Bayesian statistical setting, which provides a founda- 
tion for inference from noisy, indirect, and incomplete data, and a natural mechanism 
for incorporating heterogeneous sources of information. An objective function is con- 
structed from information theoretic measures, reflecting expected information gain 
from proposed combinations of experiments. Polynomial chaos approximations and a 
two-stage Monte Carlo sampling method are used to evaluate the expected informa- 
tion gain. Stochastic approximation algorithms are then used to make optimization 
feasible in computationally intensive and high- dimensional settings. These algorithms 
are demonstrated on model problems and on nonlinear parameter estimation prob- 
lems arising in detailed combustion kinetics. 
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1. Introduction 

Experimental data play an essential role in developing and refining models of 

physical systems. For example, data may be used to update knowledge of parameters 
in a model or to discriminate among competing models. Whether obtained through 
field observations or laboratory experiments, however, data may be difficult and ex- 
pensive to acquire. Even controlled experiments can be time-consuming or delicate to 
perform. In this context, maximizing the value of experimental data — designing ex- 
periments to be "optimal" by some appropriate measure — can dramatically accelerate 
the modeling process. Experimental design thus encompasses questions of where and 
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when to measure, which variables to interrogate, and what experimental conditions 
to employ. 

These questions have received much attention in the statistics community and 
in many science and engineering applications. When observables depend linearly on 
parameters of interest, common solution criteria for the optimal experimental de- 
sign problem are written as functionals of the information matrix [1]. These criteria 
include the well-known 'alphabetic optimality' conditions, e.g., A-optimality to min- 
imize the average variance of parameter estimates, or G-optimality to minimize the 
maximum variance of model predictions. Bayesian analogues of alphabetic optimal- 
ity, reflecting prior and posterior uncertainty in the model parameters, can be derived 
from a decision-theoretic point of view [2]. For instance, Bayesian D-optimality can 
be obtained from a utility function containing Shannon information while Bayesian 
A-optimality may be derived from a squared error loss. In the case of Gaussian-linear 
models, the criteria of Bayesian alphabetic optimality reduce to mathematical forms 
that parallel their non-Bayesian counterparts [2]. 

For nonlinear models, however, exact evaluation of optimal design criteria is much 
more challenging. More tractable design criteria can be obtained by imposing addi- 
tional assumptions, effectively changing the form of the objective; these assumptions 
include linearizations of the forward model, Gaussian approximations of the pos- 
terior distribution, and additional assumptions on the marginal distribution of the 
data [2]. In the Bayesian setting, such assumptions lead to design criteria that may 
be understood as approximations of an expected utility. Most of these involve prior 
expectations of the Fisher information matrix [3]. Cruder "locally optimal" approxi- 
mations require selecting a "best guess" value of the unknown model parameters and 
maximizing some functional of the Fisher information evaluated at this point [4]. 
None of these approximations, though, is suitable when the parameter distribution 
is broad or when it departs significantly from normality [5]. A more general design 
framework, free of these limiting assumptions, is preferred [6, 7]. 

More rigorous information theoretic criteria have been proposed throughout the 
literature. The seminal paper of Lindley [8] suggests using expected gain in Shannon 
information, from prior to posterior, as a measure of the information provided by an 
experiment; the same objective can be justified from a decision theoretic perspec- 
tive [9, 10]. Sebastiani and Wynn [11] propose selecting experiments for which the 
marginal distribution of the data has maximum Shannon entropy; this may be under- 
stood as a special case of Lindley's criterion. Maximum entropy sampling (MES) has 
seen use in applications ranging from astronomy [12] to geophysics [13], and is well 
suited to nonlinear models. Reverting to Lindley's criterion, Ryan [14] introduces a 
Monte Carlo estimator of expected information gain to design experiments for a model 
of material fatigue. Terejanu et al. [15] use a kernel estimator of mutual information 
(equivalent to expected information gain) to identify parameters in chemical kinetic 
model. The latter two studies evaluate their criteria on every element of a finite set 
of possible designs (comprising ten designs in most examples), and thus sidestep the 
challenge of optimizing the design criterion over general design spaces. And both 
report significant limitations due to computation expense; [14] concludes that "full 
blown search" over the design space is infeasible, and that two order-of-magnitude 
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gains in computational efficiency would be required even to discriminate among the 
enumerated designs. 

The application of optimization methods to experimental design has thus favored 
simpler design objectives. The chemical engineering community, for example, has 
tended to use linearized and locally optimal [16] design criteria or other objectives [17] 
for which deterministic optimization strategies are suitable. But in the broader con- 
text of decision theoretic design formulations, sampling is required. [18] proposes a 
curve fitting scheme wherein the expected utility was fit with a regression model, using 
Monte Carlo samples over the design space. This scheme relies on problem-specific in- 
tuition about the character of the expected utility surface. Clyde et al. [19] explore the 
joint design, parameter, and data space with a Markov chain Monte Carlo (MCMC) 
sampler; this strategy combines integration with optimization, such that the marginal 
distribution of sampled designs is proportional to the expected utility. This idea is 
extended with simulated annealing in [20] to achieve more efficient maximization of 
the expected utility. [19, 20, 18] use expected utilities as design criteria but do not 
pursue information theoretic design metrics. Indeed, direct optimization of informa- 
tion theoretic metrics has seen much less development. Building on the enumeration 
approaches of [13, 14, 15] and the one-dimensional design space considered in [12], [7] 
iteratively finds MES designs in multi-dimensional spaces by greedily choosing one 
component of the design vector at a time. Hamada et al. [21] also find "near-optimal" 
designs for linear and nonlinear regression problems by maximizing expected informa- 
tion gain via genetic algorithms. But the coupling of rigorous information theoretic 
design criteria, complex physics-based models, and efficient optimization strategies 
remains an open challenge. 

This paper addresses exactly these issues. Our interest is in physically realistic 
and hence computationally intensive models. We advance the state of the art by 
introducing flexible approximation and optimization strategies that yield optimal ex- 
perimental designs for nonlinear systems, using a full information theoretic formalism, 
efficiently and with few limiting assumptions. 

In particular, we employ a Bayesian statistical approach and focus on the case 
of parameter inference. Expected Shannon information gain is taken as our design 
criterion; this objective naturally incorporates prior information about the model 
parameters and accommodates very general probabilistic relationships among the 
experimental observables, model parameters, and design conditions. The need for 
such generality is illustrated in the numerical examples (Sections 5 and 6). To make 
evaluations of expected information gain computationally tractable, we introduce a 
generalized polynomial chaos surrogate [22, 23] that captures smooth dependence of 
the observables jointly on parameters and design conditions. The surrogate carries no 
a priori restrictions on the degree of nonlinearity and uses dimension-adaptive sparse 
quadrature [24] to identify and exploit anisotropic parameter and design dependencies 
for efficiency in high dimensions. We link the surrogate with stochastic approximation 
algorithms and use the resulting scheme to maximize the design objective. This 
formulation allows us to plan single experiments without discretizing the design space, 
and to rigorously identify optimal "batch" designs of multiple experiments over the 
product space of design conditions. 
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Figure 1 shows the key components of our approach, embedded in a flowchart 
describing a design-experimentation-model improvement cycle. The upper boxes fo- 
cus on experimental design: the design criterion is formulated in Sections 2.1 and 
2.2; estimation of the objective function is described in Section 2.3; and stochastic 
optimization approaches are described in Section 2.4. The construction of polyno- 
mial chaos surrogates for computationally intensive models is presented in Section 3. 
Section 4 briefly reviews computational approaches for Bayesian parameter inference, 
which come into play after the selected experiments have been performed and data 
have been collected. All of these tools are demonstrated on two example problems: a 
simple nonlinear model in Section 5 and a shock tube autoignition experiment with 
detailed chemical kinetics in Section 6. 

2. Experimental Design Formulation 

2.1. Experimental goals 

Optimal experimental design relies on the construction of a design criterion, or 
objective function, that reflects how valuable or relevant an experiment is expected 
to be. A fundamental consideration in specifying this objective is the application of 
interest — i.e., what does the user intend to do with the results of the experiments? 
For example, if one would like to estimate a particular physical constant, then a good 
objective function might reflect the uncertainty in the inferred values, or the error in 
a point estimate. On the other hand, if one's ultimate goal is to make accurate model 
predictions, then a more appropriate objective function should directly consider the 
distribution of the model outputs conditioned on data. If one would like to find the 
"best" model among a set of candidate models, then the objective function should 
reflect how well the data are expected to support each model, favoring experiments 
that maximize the ability of the data to discriminate. These considerations motivate 
the intuitive notion that an objective function should be based on specific experimental 
goals. 

In this paper, we shall assume that the experimental goal is to infer a finite number 
of model parameters of interest. Parameter inference is of course an integral part 
of calibrating models from experimental data [25]. The expected utility framework 
developed below can be generalized to other experimental goals, and we will mention 
this where appropriate. Note that one could also augment the objective function by 
adding a penalty that reflects experimental effort or cost. More broadly, one can 
always add resource constraints to the experimental design optimization problem. 
In the interest of simplicity, and since costs and constraints are inevitably problem- 
specific, we do not pursue such additions here. 

2.2. Design criterion and expected utility 

We will formulate our experimental design criterion in a Bayesian setting. Bayesian 
statistics offers a foundation for inference from noisy, indirect, and incomplete data; 
a mechanism for incorporating physical constraints and heterogeneous sources of in- 
formation; and a complete assessment of uncertainty in parameters, models, and pre- 
dictions. The Bayesian approach also provides natural links to decision theory [26] , a 
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framework we will exploit below. (For discussions contrasting Bayesian and frequen- 
tist approaches to statistics, see [27] and [28].) 

The Bayesian paradigm [29] treats unknown parameters as random variables. Let 
(fi, J-", P) be a probability space, where f2 is a sample space, J-" is a a-field, and P 
is a probability measure on J-"). Let the vector of real- valued random variables 
6 : Q ^ M"® denote the uncertain parameters of interest, i.e., the parameters to be 
conditioned on experimental data. 6 is associated with a measure fi on M"® , such that 
f^{A) = F{e-^ (A)) for A G M"". We then define p{e) = dfi/dO to be the density of 
with respect to Lebesgue measure. For the present purposes, we will assume that 
such a density always exists. Similarly we treat the data y as an ]R"''-valued random 
variable endowed with an appropriate density, d G M"""* denotes the design variables 
or experimental conditions. Hence is the number of uncertain parameters, Uy is the 
number of observations, and rid is the number of design variables. If one performs an 
experiment at conditions d and observes a realization of the data y, then the change 
in one's state of knowledge about the model parameters is given by Bayes' rule: 

p{y\e,d)pie\d) 

Here p{B\d) is the prior density, p{y\6,d) is the likelihood function, p(0|y,d) is 
the posterior density, and p(y|d) is the evidence. It is reasonable to assume that 
prior knowledge on does not vary with the experimental design, leading to the 
simplification p{0\d) = p{0). 

Taking a decision theoretic approach, Lindley [9] suggests that an objective for 
experimental design should have the following general form: 

U{d) = [ [ u{d,y,0)p{0,y\d)d0dy 
Jy J& 

= [ [ u{d,y,0)p{0\y,d)p{y\d)d0dy, (2) 
Jy Jo 

where u{d,y,0) is a utility function, U{d) is the expected utility, is the support 
of p{0), and 3^ is the support of p{y\d). The utility function u should be chosen to 
reflect the usefulness of an experiment at conditions d, given a particular value of the 
parameters and a particular outcome y. Since we do not know the precise value of 
and we cannot know the outcome of the experiment before it is performed, we take 
the expectation of u over the joint distribution of and y. 

Our choice of utility function is rooted in information theory. In particular, fol- 
lowing [8], we put u{d,y,0) equal to the relative entropy, or KuUback-Leibler (KL) 
divergence, from the posterior to the prior. For generic distributions A and B, the 
KL divergence from A to 5 is 



DKdA\\B)= / pA{0)ln 



& 



Pa{0) 



d0 = Ea 



PBie) 



(3) 



where pa and ps are probability densities, is the support of pb{0), and OlnO = 0. 
This quantity is non-negative, non-symmetric, and reflects the difference in informa- 
tion carried by the two distributions (in units of nats) [30, 31]. Specializing to the 
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inferential problem at hand, the KL divergence from the posterior to the prior is 

'p{o\y,d) 



u{d,y,e) ^ DKLiPe{-\y,d)\\pe{-)) = / p(0|y,d)ln 

'& 



p{6) 



de = uid,y). (4) 



Note that this choice of utility function involves an "internal" integration over the 
parameter space {6 is a dummy variable representing the parameters), therefore it is 
not a function of the parameters 0. Thus we have 



[/(d) = / / u{d,y)p{e\y,d)dep{y\d)dy 



u{d,y)p{y\d)dy 



y 




p{e\y,d) In 



y J& 



p{o\y,d) 
Pie) 



dep{y\d)dy. 



(5) 



To simplify notation, 6 in Equation 5 is replaced by 0, yielding 

\p io\y,d 

'yj& L 



[/(d) = / / Pie\y,d)\n 
Jy J@ 



dep{y\d)dy 



Ey|d [Dkl ipie\y,d)\\p{0))]. 



(6) 



The expected utility U is therefore the expected information gain in 6. The intuition 
behind this expression is that a large KL divergence from posterior to prior implies 
that the data y decrease entropy in by a large amount, and hence those data are 
more informative for parameter inference. As we have only a distribution for the 
data y|d that may be observed, we are interested in maximizing information gain on 
average. We also note that U is equivalent to the mutual information between the 
parameters and the data y.^ When applied to a linear-Gaussian design problem, 
U reduces to the Bayesian D-optimality condition.^ 

Finally, the expected utility must be maximized over the design space V to find 
the optimal experimental design 



argmax [/(d). 



(7) 



^Using the definition of conditional probability, we have 

p(g|y,d) 
p(0,y\d) 



[/(d) = / / P{0\y,d)ln 
Jy J& 

^11 Me,y|d)ln 
Jy J& 



ly- 
/(e;y|d), 



p{e)p{y\d) 



dep{y\A)dY 
de dy 



which is the mutual information between parameters and data, given the design. 

^D-optimality maximizes the determinant of the information matrix in a linear design problem [1] . 
Bayesian D-optimality, in a linear-Gaussian problem, maximizes the determinant of the sum of the 
information matrix and the prior covariance [2] . 
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What if resources allow multiple (say > 1) experiments to be carried out, 
but time and setup constraints require them to be planned (and perhaps performed) 
simultaneously? If d* is the optimal design parameter for a single experiment, the 
best choice is not necessarily to repeat the experiment at d* N times; this does 
not generally yield the optimal expected information gain from all the experiments. 
(Appendix A shows that the expected utility of two experiments is not, in general, 
equal to the sum of the expected utilities of the individual experiments. Section 5 
provides a numerical example of this situation.) Instead, all of the experiments should 
be incorporated into the likelihood function, where now d G M^"'*, y G M^"", and the 
data from the different experiments are conditionally independent given and the 
augmented d. The new optimal design d* G M^"'^ then carries the A^ sets of conditions 
for all the experiments, maximizing the expected total information gain when these 
experiments are simultaneously performed. It is interesting to note that a simpler 
objective function often used in experimental design — the predictive variance of the 
data y — would always suggest repeating all A^ experiments at the single-experiment 
design optimum. 

If the A^ experiments need not be carried out simultaneously, then sequential 
experimental design may be performed. In general, a sequential design uses the 
results of one set of experiments (i.e., the y that are actually observed) to help plan 
the next set of experiments. In one possible approach — in fact, a greedy approach — 
an optimal experiment is initially computed and carried out, and its data are used 
to perform inference. The resulting posterior p(0,y|di) is then used as the prior 
p{0) in the design of the next experiment d2, and the process is repeated. This 
approach is not necessarily optimal over a horizon of many experiments, however. A 
more rigorous treatment would involve formulating the sequential design problem as 
a dynamic programming problem, but this is beyond the scope of the present paper. 
Intuitively, a sequential experimental design should be at least as good as a fixed 
design, due to the extra information gained in the intermediate stages. 

2. 3. Numerical evaluation of the expected utility 

Typically, the expected utility in Equation 6 has no closed form and must be 
approximated numerically. One approach is to rewrite U (d) as 



where the second equality is due to the application of Bayes' theorem to the quantities 
both inside and outside the logarithm. In the special case where the Shannon entropy 
of p(y,0|d) is independent of the design variables d, the first term in Equation 8 
becomes constant for all designs [32] and can be dropped from the objective function. 
Maximizing the remaining term — which is the entropy of p(y |d) — is then equivalent to 



Uid) = J^Jjie\y,d)\n ^^j^ dep{y\d)dy 

= / / {\n[p{y\e,d)]-\n[p{y\d)]}p{y\e,d)p{e)dedy, 




Jy J& 
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the maximum entropy sampling approach of Sebastiani and Wynn [1 1] . Here we retain 
the more general formulation of Equation 8 in order to accommodate, for example, 
likelihood functions containing a measurement error whose magnitude depends on y 
or d. 

Monte Carlo sampling can then be used to estimate the integral in Equation 8 

U{d) ^ ^ y; {In [p(y«|0«,d)] - In [p(y«|d)] } , (9) 

where 0*^*^ are drawn from the prior p{0); y*-*^ are drawn from the conditional dis- 
tribution p{y\0 = 0^^\d) (i.e., the likelihood); and nout is the number of samples in 
this "outer" Monte Carlo estimate. The evidence evaluated at y*^*\ p{y^^^\d), typi- 
cally does not have an analytical form, but it can be approximated using yet another 
importance sampling estimate: 

p(y«|d) = / p(y«|0,d)p(0)d0 
J& 

^ p(yW|d) = — 5^p(y«|0(^'^-),d), (10) 

^in ■ , 

where 0^''^'^ are drawn from the prior p{6) and is the number of samples in this 
"inner" Monte Carlo sum. The combination of Equations 9 and 10 yields a bi- 
ased estimator f/(d) of f/(d) [14]. The variance of this estimator is proportional 
to 74(d)/?7,out + -B(d)/nout^in, where A and B are terms that depend only on the dis- 
tributions at hand. The bias is proportional to C{d)/n\-a [14]. Hence n^^ controls the 
bias while riout controls the variance. 

Evaluating and sampling from the likelihood for each new sample of 6 constitutes 
the most significant computational cost above (see Section 3). In order to mitigate 
the cost of the nested Monte Carlo estimator, we draw a fresh batch of prior samples 
{^*'^''}fc=i every d, and use this set for both the outer Monte Carlo sum (i.e., 
Q{i) ^ Q(k)^^ QT^f^ ^]^g inner Monte Carlo estimates at that d (i.e., ^^''-'^ = 0'^^\ 
and consequently rZout = ''^in)- This treatment reduces the computational cost for a 
fixed d from O (nout'^in) to O (riout)- hi practice, sample reuse also avoids producing 
near-zero evidence estimates (and hence infinite values for the expected utility) at 
small sample size. The reuse of samples adds bias, but this effect is very small [33]. 

2.4- Stochastic optimization 

Now that the expected utility U{d) can be estimated at any value of the design 
parameters, we turn to the optimization problem (7). Maximizing U via a grid search 
over T> is clearly impractical, since the number of grid points grows exponentially 
with dimension. Since only a Monte Carlo estimate U{d) of the objective function 
is available, another naive approach would be to use a large sample size (nout,''^in) 
at each d and then apply a deterministic optimization algorithm, but this is still 
too expensive. (And even with large sample sizes, U{d) is effectively non-smooth.) 
Instead, we would like to use only a few Monte Carlo samples to evaluate the objective 
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at any given d, and thus we need algorithms suited to noisy objective functions. Two 
such algorithms are simultaneous perturbation stochastic approximation (SPSA) and 
the Nelder-Mead nonlinear simplex (NMNS). 

SPSA, proposed by Spall [34, 35], is a stochastic approximation method that has 
received considerable attention [36]. The method is similar to a steepest-descent 
method using finite difference estimates of the gradient, except that SPSA only uses 
two random perturbations to estimate the gradient regardless of the problem's di- 



mension: 



gfc(dA 



U (dfc + Cfc Afc) - U (dfc - Cfc Afc) 
2cfc 



A.I 



Afc,nd 



(11) 

(12) 



where k is the iteration number, 



ak 



(A + k + l) 



Ck 



(k + l) 



7 ' 



(13) 



and a, A, a, c, and 7 are algorithm parameters with recommended values available, 
e.g., in [35]. is a random vector whose entries are i.i.d. draws from a symmetric 
distribution with finite inverse moments [34]; here, we choose A^ j ~ Bernoulli (0.5). 
Common random numbers are also used to evaluate each pair of estimates f/(dfc + 
Cfc Afc) and U{dk — Ck^k) at a given d^, in order to reduce variance in estimating the 
gradient [37]. 

An intuitive justification for SPSA is that error in the gradient "averages out" over 
a large number of iterations [34]. Convergence proofs with varying conditions and 
assumptions can be found in [38, 39, 40]. Randomness introduced through the noisy 
objective U and the finite-difference-like perturbations allows for a global convergence 
property [41]. Constraints in SPSA are handled by projection: if the current position 
does not remain feasible under all possible random perturbations, then it is projected 
to the nearest point that does satisfy this condition. 

The Nelder-Mead nonlinear simplex algorithm [42] has been well studied and is 
widely used for deterministic optimization. The details of the algorithm are thus 
omitted from this discussion but can be found, e.g., in [42, 43, 44]. This algorithm 
has a natural advantage in dealing with noisy objective functions because it requires 
only a relative ordering of function values, rather than the magnitudes of differences 
(as in estimating gradients). Minor modifications to the algorithm parameters can 
improve optimization performance for noisy functions [43]. Constraints in NMNS are 
handled simply by projecting from the infeasible point to the nearest feasible point. 

There are advantages and disadvantages to both algorithms. SPSA is a gradient- 
based approach, taking advantage of any regularity in the underlying objective func- 
tion while requiring only two function evaluations per step to estimate the gradient 
instead of 2na evaluations, as with a full finite-difference scheme. However, a very 
high noise level can lead to slow convergence and cause the algorithm to stagnate in 
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local optima. NMNS is relatively less sensitive to the noise level, but the simplex can 
be unfavorably distorted due to the projection treatment of constraints, leading to 
slow or false convergence. 

Using either of the algorithms described in this section, we can solve the stochastic 
optimization problem posed in Equation 7 and obtain the best experimental design. 
In a sense, this completes the experimental design phase of Figure 1. But a remaining 
difficulty is one of computational cost. Even with an effective Monte Carlo estimator 
of the expected utility, and with efficient algorithms for stochastic optimization, the 
complex physical model embedded in Equation 9 still must be evaluated repeatedly, 
over many values of the model parameters and design variables. Methods for making 
this task more tractable are discussed in the next section. 

3. Polynomial Chaos Surrogate 

Expensive physical models can render the evaluation and maximization of ex- 
pected information gain impractical. Models enter the formulation through the like- 
lihood function p(y|0,d). For example, a simple likelihood function might allow for 
an additive discrepancy between experimental observations and model predictions: 

y = G(0,d) + e. (14) 

Here, e is a random variable with density p^; we leave the form of this density non- 
specific for now. The "forward model" of the experiment is G : M"" x W^'^ — > W^y; it 
maps both the design variables and the parameters into the data space. Drawing a 
realization from p(y |0, d) thus requires evaluating G at a particular (^, d). Evaluating 
the density p(y d) = Peiy — G{0, d)) again requires evaluating G. 

To make these calculations tractable, one would like to replace G with a cheaper 
"surrogate" model that is accurate over the entire prior support and the entire 
design space V. Many options exist, ranging from projection-based model reduc- 
tion [45, 46] to spectral methods based on polynomial chaos (PC) expansions [47, 
22, 23, 48, 49, 50, 51]. The latter approaches do not reduce the internal state of a 
deterministic model; rather, they explicitly exploit regularity in the dependence of 
model outputs on uncertain input parameters. Polynomial chaos has seen extensive 
use in a range of engineering applications (e.g., [52, 53, 54, 55]) including parameter 
estimation and inverse problems (e.g., [56, 57, 58]), and this is the approach we shall 
use. 

Let : — J- M be i.i.d. random variables defined on a probability space {Q, J-", P), 
where Q is the sample space, J-" is the a-field generated by all the ^j, and P is the 
probability measure. Then any random variable ^ : f2 — )■ M, measurable with respect 
to {fljJ-') and possessing finite variance, 6 G L^(fi,P), can be represented as follows: 

oo 

^(u;) = 5^^iVl/j(^,(u;),^2(c^),...), (15) 

|i|=0 

where a; G is an element of the sample space; i = {ii, 22, ■■■)■, ij & Nq, is an infinite- 
dimensional multi-index; |i| = ii + 12 + . . . is the li norm; G M are the expansion 
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coefficients; and 



3=1 

are multivariate polynomial basis functions [23]. Here ipi- is an orthogonal polynomial 
of order ij in the variable where orthogonality is with respect to the distribution 

of 0' 

and S is the support of p{^). The expansion 15 is convergent in the mean-square 
sense [59] . For computational purposes, the infinite sum and infinite dimension must 
be truncated to some finite stochastic dimension and polynomial order. A common 
choice is the "total-order" truncation |i| < p: 

eico) ^ 5^^^iVi/j(^i,e2,...,aj (18) 

|i|<P 

ris 

The total number of terms in this expansion is 

/ ns+p\ {n,+p)\ 
npc = [ ] = r-j— • (20) 

\ P J Hslpl 

The choice of p is influenced by the degree of nonlinearity in the relationship between 
6 and ^j, and the choice of reflects the degrees of freedom needed to capture the 
stochasticity of the system. These choices might also be constrained by the availability 
of computational resources, as npc grows quickly when these numerical parameters 
are increased. 

3.1. Joint expansion for design variables 

In the Bayesian setting, the model parameters 6 are random variables, for which 
PC expansions are easily applied. But the model outputs also depend on the design 
conditions, and constructing a separate PC expansion at each value of d required 
during optimization would be impractical. Instead, we can construct a single PC ex- 
pansion for each component of G, depending jointly on 6 and d. (Similar suggestions 
have recently appeared in the context of robust design [60].) To proceed, we increase 
the stochastic dimension by the number of design dimensions, putting Ug = ne + n^, 
where we have assigned one stochastic dimension to each component of and one 
to each component of d for simplicity. Further, we assume an affine transformation 
between each component of d and the corresponding {^i}"=„g+i; any value of d can 
thus be uniquely associated with a vector of these ^j. Since the design parameters 
will usually be supported on a bounded domain (e.g., inside some hyper-rectangle) 
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the corresponding are given uniform distributions. (The corresponding univariate 
ipi are thus Legendre polynomials.) These distributions effectively define a uniform 
weight function over the design space P that governs where the L^-convergent PC 
expansions should be accurate. 

3.2. Pseudospectral projection 

Constructing the PC expansion involves computing the coefficients Oi, this gen- 
erally can proceed via two alternative approaches, intrusive and non-intrusive. The 
intrusive approach results in a new system of equations that is larger than the origi- 
nal deterministic system, but it can be solved only once. The difficulty of this latter 
step depends strongly on the character of the original equations, however, and may 
be prohibitive for arbitrary nonlinear systems. The non-intrusive approach computes 
the expansion coefficients by directly projecting the quantity of interest (e.g., the 
model output) onto the basis functions {^E'i}. One advantage of this method is that 
the deterministic solver can be reused and treated as a black box. The deterministic 
problem needs to be solved many times, but typically at carefully chosen parameter 
values. The non-intrusive approach also offers flexibility in choosing arbitrary func- 
tional of the state trajectory as observables; these functionals may depend smoothly 
on ^ even when the state itself has a less regular dependence. (The combustion model 
in Section 6 provides an example of such a situation.) 

Taking advantage of orthogonality, the PC coefficients are simply: 

where is the PC coefficient with multi-index i for the cth observable.'^ Analytical 
expressions are available for the denominators but the numerators must be 

evaluated via numerical quadrature, because of the forward model G. The resulting 
approach is termed pseudospectral projection, or non-intrusive spectral projection 
(NISP). When the evaluations of the integrand (and hence the forward model) are 
expensive and Ug is large, an efficient method for high- dimensional integration is 
essential. 

3.3. Dimension- adaptive sparse quadrature 

A host of useful methods are available for numerical integration [61, 62, 63, 64, 
65, 66]. In the present context, we seek a method that can evaluate the numerator of 
Equation 21 efficiently in high dimensions, i.e., with a minimal number of integrand 
evaluations, taking advantage of regularity and anisotropy in the dependence of G on 
and d. We thus employ the dimension-adaptive sparse quadrature (DASQ) algo- 
rithm of Gerstner and Griebel [24], an efficient extension of Smolyak sparse quadrature 



^Here we are equating the dimension of the forward model output with the number of observables 
Uy. If the data contain repeated observations of the same quantity, for instance, in the case of multiple 
experiments, then the same PC approximation can be used for all model-based predictions of that 
quantity. 
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that adaptively tensorizes quadrature rules in each coordinate direction. It has a weak 
dependence on dimension, making it an excellent candidate for problems of moderate 
size (e.g., < 100). Its formulation is briefly described below. 
Let 

QS'V = E^*/(^^) (22) 

i=l 

be the Ith level (with rii quadrature points) of some univariate quadrature rule, where 
Wi are the weights and /(^) is the integrand. The level is usually defined to take 
advantage of any nestedness in the quadrature rule and to reduce the overall com- 
putational cost. We have chosen to use Clenshaw-Curtis (CC) quadrature, with the 
following level definition 

ni = 2'-^ + 1, / > 2, ni = l. (23) 

The CC rule is especially appealing because it is accurate,^ nested, and easy to 
construct.^ 

The difference formulas, defined by 

A,/ = {Qi'^ - Q^l,)f, Q'i^f = 0, (24) 

are the differences between ID quadrature rules at two consecutive levels. The sub- 
traction is carried out by subtracting the weights at the quadrature points of the 
lower level. Then, for k G Nf (where each entry of the multi-index k represents the 
level in that dimension, with a total of d dimensions), the multivariate quadrature 
rule is defined to be 

Q/ = 5^(Afc,®---®A,J/, (25) 

k</C 

where K, is some set determined by the adaptation algorithm, to be described below. 
For example, /C : |k|i < L+d—1 where L is some user-defined level, corresponds to the 
Smolyak sparse quadrature, while /C : |k|oo < n + 1 corresponds to a tensor-product 
quadrature. 

The original DASQ algorithm can be found in [24]. The idea is to divide all the 
multi-indices k into two sets: an old set and an active set. A member of the active 
set is able to propose a new candidates by increasing the level in any dimension by 1. 



Although Gauss-Legendre quadrature (which is not nested) has a higher degree of polynomial 
exactness, [67] notes that 

"the Clenshaw-Curtis and Causs formulas have essentially the same accuracy unless f 
is analytic in a sizable neighborhood of the interval of the integration — in which case 
both methods converge so fast that the difference hardly matters. " 

^The abscissae are simply Xi = cos (■^), and the weights can be computed very efficiently via 
FFT [68, 69[, requiring only O (nlogn) time and introducing very little roundoff error. 



13 



However, the candidate can only be accepted if all its backward neighbors are in the 
old set; this so-called admissibility condition ensures the validity of the telescoping 
expansion of the general sparse quadrature formulas via the differences A^. Finally, 
each multi-index has an error indicator, which is proportional to its corresponding 
summand value in Equation 25. Intuitively, if this term contributes little to the 
overall integral estimate, then integration error due to this term should be small. 
New candidates are proposed from the multi-index corresponding to the highest error 
estimate. The process iterates until the sum of error indicators for the active set 
members falls below some user-specified tolerance. More details, including proposed 
data structures for this algorithm, can be found in [24]. One drawback of DASQ is 
that parallelization can only be implemented within the evaluation of each k, which 
is not as efficient as the parallelization in non- adaptive methods. The original DASQ 
algorithm also does not address how integrand evaluations at nested quadrature points 
can easily be identified and reused as adaptation proceeds. Huan [33] proposes an 
algorithm to solve this problem, taking advantage of the specific quadrature structure. 

The ultimate goal of quadrature is to compute the polynomial chaos coefficients 
of the model outputs in Equation 21. There are a total of npc (see Equation 20) 
coefficients for each output variable, and a total of Uy model outputs, yielding a total 
of rZcoef = npcUy integrals. To simplify notation, let the PC coefficients Gc,i, c = 
1 . . .Uy, \i\ < p, he re- indexed by G^, = 1 • • • ''^coef- It would be very inefficient to 
compute each integral from scratch, since the corresponding quadrature points will 
surely overlap and any evaluations of G{0{$,), d{^)) ought to be reused. To realize this 
computational savings, the original DASQ algorithm is altered to integrate for all the 
coefficients Gr simultaneously. We guide all the integrations via a single adaptation 
route, which uses a "total effect" local error indicator /ik that reflects all the local 
error indicators /i^.k from the integrals. The total effect indicator at a given k may 
be deflned as the max or 2-norm of the local error indicators {/ir,k}r=T'^- The new 
algorithm is presented as Algorithm 1. 

Lastly, compensated summation (the Kahan algorithm [70]) is used throughout 
our implementation, as it signiflcantly reduces numerical error when summing long 
sequences of flnite-precision floating point numbers as required above. 

4. Bayesian Parameter Inference 

Once data are collected by performing an optimal experiment, they can be used 
in the manner specifled by the original experimental goal. In the present case, the 
goal is to infer the model parameters 6 by exploring or characterizing the posterior 
distribution in Equation 1. Ideally the data will lead to a narrow posterior such that, 
with high probability, the parameters can only take on small range of values. 

The posterior can be evaluated pointwise up to a constant factor, but comput- 
ing it on a grid is immediately impractical as the number of dimensions increases. 
A more economical method is to generate independent samples from the posterior, 
but given the arbitrary form of this distribution (particularly for nonlinear G), di- 
rect Monte Carlo sampling is seldom possible. Instead, one must resort to Markov 
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chain Monte Carlo (MCMC) sampling. Using only pointwise evaluations of the un- 
normalized posterior density, MCMC constructs a Markov chain whose stationary 
and limiting distribution is the posterior. Samples generated in this way are cor- 
related, such that the effective sample size is smaller than the number of MCMC 
steps. Nonetheless, a well-tuned MCMC algorithm can be reasonably efficient. The 
resulting samples can then be used in various ways — to evaluate marginal posterior 
densities, for instance, or to approximate posterior expectations 



where 0*^*^ are samples extracted from the chain (perhaps after burn-in or thinning). 
For example, the minimum mean square error (MMSE) estimator is simply the mean 
of the posterior, while the corresponding Bayes risk is the posterior variance, both of 
which can be estimated using MCMC. 

A very simple and powerful MCMC method is the Metropolis-Hastings (MH) algo- 
rithm, first proposed by Metropolis et al. [71], and later generalized by Hastings [72]; 
details of the algorithm can be found in [73, 74, 75, 76]. Two useful improvements 
to MH are the concepts of delayed rejection (DR) [77, 78] and adaptive Metropolis 
(AM) [79]; combined these comprise the DRAM algorithm of Haario et al. [80]. While 
countless other MCMC algorithms exist or are under active development, some in- 
volving derivatives (e.g., Langevin) or even Hessians of the posterior density, DRAM 
offers a good balance of simplicity and efficiency in the present context. 

Even with efficient proposals, MCMC typically requires a tremendous number of 
samples (tens of thousands or even millions) to compute posterior estimates with 
acceptable accuracy. Since each MCMC step requires evaluation of the posterior 
density, which in turn requires evaluation of the likelihood and thus the forward 
model G, surrogate models for the dependence of G on can offer tremendous 
computational savings. Polynomial chaos surrogates, as described in Section 3, can 
be quite helpful in this context [56, 57, 58]. 

5. Application: Nonlinear Model 

We first illustrate the optimal design of experiments using a simple algebraic 
model, nonlinear in both the parameters and the design variables. Since the model 
is inexpensive to evaluate, we use it to illustrate features of the core formulation — 
estimating expected information gain, designing single or multiple experiments, and 
the role of prior information — leaving demonstrations of stochastic optimization and 
polynomial chaos surrogates to the next section. 




(26) 



with the n^-sample average 
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5.1. Design of a single experiment 

Consider a simple nonlinear model with a scalar observable one uncertain pa- 
rameter 9, and one design variable d: 

y{e,d) = G{e,d) + e 

= e^d^ + eexp{-\0.2-d\) + e, (28) 

where G{6, d) denotes the model output (without noise) and e ~ 7\/'(0, 10"'*) is an 
additive Gaussian measurement error. Let the prior be ^ ~ W(0, 1) and the design 
space he d E [0, 1]. 

Suppose our experimental goal is to infer the uncertain parameter 9 based on 
a single measurement y. The expected utility U{d) in Equation 6 and its sample 
estimate tj{d) in Equation 9 are appropriate choices, and our ultimate goal is to 
maximize U{d). Figure 2(a) shows estimates of the expected utility, using rZout = 
riin = 10^, plotted along a 101-node uniform grid spanning the entire design space. 
Local maxima appear ai d = 0.2 and d = 1.0, a pattern which can be understood by 
examining Equation 28. A d value away from 0.2 or 1.0 (such as d = 0) would lead 
to an observation y that is dominated by the noise e, which is not useful for inferring 
the uncertain parameter 6. But if d is chosen close to 0.2 or 1.0, such that the noise 
is insignificant compared to the first or second term of the equation, then y would be 
very informative for 6. 

5.2. Design of two experiments 

Consider the "batch" or fixed design of two experiments (where the results of 
one experiment cannot be used to design the other, as described in Section 2.2). 
Moreover, assume that both experiments are described by the same model; this is 
not a requirement, but an assumption adopted here for simplicity. Then the overall 
algebraic model is simply extended to 

y(0,d) = G(^,d) + e 

yi{.o,di) 
_ y2iO,d2) 

where the subscripts -i and -2 denote variables associated with experiments 1 and 2, 
respectively. Note that there is still a single common parameter 6. The errors ei and 
62 are i.i.d. with ei, 62 ~ A^(0, lO""*). 

Again using a W(0, 1) prior on 6, the expected utility is plotted in Figure 3(a). 
First, note the symmetry in the contours along the di = ^2 line, which is expected 
since the two experiments have identical structure. Second, the optimal pair of exper- 
iments is not just a repeat of the optimal single-experiment design: d* 7^ (1, 1), and 
instead we have d* = (0.2, 1.0) or (1.0,0.2). Some insight can be obtained by exam- 
ining Figure 4, which plots the single-experiment model output G{6, d) as a function 
of the uncertain parameter 9 at the two locally optimal designs: d = 0.2 and d = 1. 
Intuitively, a high slope of G should be more informative for the inference of ^, as the 
output is then more sensitive to variations in the input. The plot shows that neither 



e^di + eexp{-\0.2-di\) 

e^dl + eexp{-\o.2-d2\) 



+ 



ei 

£2 
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design has a greater slope over the entire range of the prior 9 ~ W(0, 1). Instead, the 
slope is greater for 9 G [0,^e] with design d = 0.2, and greater for 9 G [9e, 1.0] with 



Let us then examine the cases of "restricted" priors 9 ~ W(0, 9e) and 9 ~ U{9e, 1). 
Expected utilities for a single experiment, under either of these priors, are shown in 
Figures 2(b) and 2(c). The optimal design for U{0,9e) is at 0.2 and for U{9e, 1) it is 
at 1.0, supporting intuition from the analysis of slopes. Next, the expected utilities 
of two experiments, under the restricted priors, are shown in Figures 3(b) and 3(c). 
Since in both cases, a single design point can give G maximum slope over the entire 
restricted prior range of 9, it is not surprising that the optimal pair of experiments 
involves repeating the respective single-experiment optima. In contrast, the lack of 
a "clear winner" over the full prior U{0, 1) intuitively explains why a combination 
of different design conditions may yield more informative data overall. Note that 
we have only focused on the two local optima d = 0.2 and d = 1 from the original 
9 ~ W(0, 1) analysis, but it is possible that new local or globally optimal design points 
could emerge as the prior is changed. 

6. Application: Combustion Kinetics 

Experimental diagnostics play an essential role in the development and refinement 
of chemical kinetic models for combustion [81, 82]. Available diagnostics are often 
indirect, imprecise, and incomplete, leaving significant uncertainty in relevant rate 
parameters and thermophysical properties [83, 84, 53, 85]. Uncertainties are partic- 
ularly acute when developing kinetic models for new combustion regimes or for fuels 
derived from new feedstocks, such as biofuels. Questions of experimental design — e.g., 
which species to interrogate and under what conditions — are thus of great practical 
importance in this context. 

6.1. Model description 

We demonstrate our optimal experimental design framework on shock tube ig- 
nition experiments, which are a canonical source of kinetic data. In a shock tube 
experiment, the mixture behind the reflected shock experiences a sharp rise in tem- 
peature and pressure; if conditions are suitable, this mixture then ignites after some 
time, known as the ignition delay time. Ignition delays and other observables ex- 
tracted from the experiment carry indirect information about the elementary chemi- 
cal kinetic processes occurring in the mixture. These experiments are well described 
by the dynamics of a spatially homogeneous, adiabatic, constant-pressure chemical 
mixture. 

We model the evolution of the mixture using ordinary differential equations (ODEs) 
expressing conservation of energy and of individual chemical species. Governing equa- 
tions are detailed in Appendix B. We consider an initial mixture of hydrogen and 
oxygen. (Note that H2-O2 kinetics are a key subset of the reaction mechanisms as- 
sociated with the combustion of complex hydrocarbons.) Our baseline kinetic model 
is a 19-reaction mechanism proposed in [86], reproduced in Table 1. Detailed chem- 
ical kinetics lead to a stiff set of nonlinear ODEs, with state variables consisting 



design d = 1.0, where 9^ 




0.4373. 
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of temperature and species mass or molar fractions. The initial condition of the 
system is specified by the initial temperature Tq and the fuel-oxidizer equivalence 
ratio 0. Species production rates depend on the mixture conditions and on a set of 
kinetic parameters: pre-exponential factors Am, temperature exponents bm, and ac- 
tivation energies Ea^m, where m is the reaction number in Table 1. These parameters 
are important in determining combustion characteristics and are of great interest in 
practice. Thermodynamic parameters and reaction rates in the governing equations 
are evaluated with the help of Cantera 1.7.0 [87, 88], an open-source chemical kinet- 
ics software package. ODEs are solved implicitly, using the variable-order backwards 
differentiation formulas implemented in CVODE [89]. 

6.2. Experimental goals 

In this study, the experimental goal is to infer selected kinetic parameters {Am, bm, 
and Ea^m) associated with the elementary reactions in Table 1. For demonstration, we 
let the kinetic parameters of interest be Ai and -Ea,3- Reaction 1 is a chain-branching 
reaction, leading to a net increase in the number of radical species in the system. 
Reaction 3 is a chain-propagating reaction, exchanging one radical for another, but 
nonetheless relevant to the overall dynamics.^ We infer ln{Ai/Ai) rather than Ai 
directly, where A^ is the nominal value of Ai in [86]; this transformation ensures 
positivity and let us easily impose a log- uniform prior on Ai, which is appropriate 
since the pre-exponential is a multiplicative factor. The design variables are the initial 
temperature Tq and equivalence ratio 0. 

We once again use the expected utility defined in Equation 6 (and its estimator in 
Equation 9) as the objective to be maximized for optimal design. Unlike the algebraic 
model in Section 5, however, this combustion problem offers many possible choices of 
observable. Some observables are informative than others; we explore this choice in 
the next section. 

6.3. Observables and likelihood function 

Typical trajectories of the state variables are shown in Figure 5. The tempera- 
ture rises suddenly upon ignition; reactant species are rapidly consumed and product 
species are produced as the mixture comes to equilibrium. The most complete and 
detailed set of system observables are the state variables as a function of time. One 
could simply discretize the time domain to produce a finite-dimensional data vector 
y. Too few discretization points might fail to capture the state behaviour, however. 
And because the kinetic parameters affect ignition delay, the state at any given time 
may have a nearly discontinuous dependence on the parameters. (This is due to the 
sharpness of the ignition front; at a fixed time, the state is most probably either 
pre-ignition or post-ignition.) Such a dependence makes construction of a polyno- 
mial chaos surrogate far more challenging [90]. It is desirable to transform the state 



^As the methodology explored here is quite general, we have the freedom to select any parameters 
appearing in the mechanism. The selection reflects the particular goals of the experimentalist or 
investigator. We also note that the "evaluated" combustion kinetic data in [83, 84] can help select 
parameters to target for inference and help define their prior ranges. 



18 



into alternate observables that somehow "compress" the information and that depend 
relatively smoothly on the kinetic parameters, while retaining features that are po- 
tentially relevant to the experimental goals. We would also like to select observables 
that are easy to obtain experimentally. 

Taking the above factors into consideration, we will use the observables in Table 2. 
The observables are the peak value of the heat release rate, the peak concentrations 
of various intermediate chemical species (O, H, HO2, H2O2), and the times at which 
these peak values occur. Examples of Xj^^, th, ^|^- and Xh,t are shown in Figure 6. 
The time of peak heat release coincides with the time at which temperature rises 
most rapidly. We thus take it as our definition of ignition delay, rjg„. We use the 
logarithm of all the characteristic time variables in our actual implementation, as 
the times are positive and vary over several orders of magnitude as a function of the 
kinetic parameters and design variables. 

The likelihood is defined using the ODE model predictions and independent addi- 
tive Gaussian measurement errors: y = G{0, d) + e, with components ec ~ A^(0, o"^). 
For the concentration observables, the standard deviation of the measurement error 
is taken to be 10% of the value of the corresponding signal: 

=O.ia(0,d). (30) 

For the characteristic-time observables, we add a small constant a = 10~^ s to the 
standard deviation, reflecting the minimum resolution of the timing technology: 

al = 0.1Gc{0,d) + a. (31) 

Note that the noise magnitude depends implicitly on both the kinetic parameters 
and the design variables. Both terms contributing to the expected information gain 
in Equation 8 are therefore influential, and one would expect a maximum entropy 
sampling approach to yield different results than the present experimental design 
methodology. 

6.4- Polynomial chaos construction 

Each solve of the ODE system defining G{0, d) is expensive, and thus we employ 
a polynomial chaos surrogate. In practice, since non-intrusive construction of the sur- 
rogate requires many forward model evaluations, the surrogate is only worth forming 
if the total number of model evaluations required for optimization of the expected 
utility exceeds the number required for surrogate construction. A detailed analysis of 
this tradeoff and the potential computational gains can be found in Section 6.7. 

Uniform priors are assigned to the model parameters 6 = [ln{Ai/A^),Ea^3] and 
uniform input distributions are assumed for the design variables [Tq, (p] (see Sec- 
tion 3.1), with the supports given in Table 3. The polynomial chaos expansion thus 
uses Legendre polynomials, with ^1, ^2, ^3, ^4 ~ W(~l; !)• Our goal now is to construct 
PC expansions for the model outputs G{0,d) = (Inxj^^, Inro, Inr//, lnrH02;lii''"H2 02; 

, Xo,T, Xh^t-, Xho2,t, Xh202,t), using the projection given in Equation 21. For 
each desired PC coefficient, the numerator in that equation is evaluated using the 
modified DASQ algorithm described in Section 3.3. The expansions are truncated 
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at a total order of p = 12, and DASQ is stopped once a total of nq^ad = 10^ func- 
tion evaluations have been exceeded. The degree of this expansion is admittedly 
(and deliberately) chosen rather high. The performance of lower-order expansions is 
examined below and in Section 6.7. 

Indeed, the accuracy of the PC surrogate can be analyzed more rigorously by 
evaluating its relative error over a range of p and nquad values: 

Cc = — 7. ! c=l...ny. (32) 

For the cth observable, Gc is the output of the orig inal ODE model and G'?'"'!"-'^ is the 
corresponding PC expansion. 6 and d are affine functions of ^ (the PC expansions 
of the model inputs). Accurately evaluating the error is expensive, certainly more 
expensive than computing (7^'"''"'"* jn the first place. But additional integration error 
must be minimized, and the integrals in Equation 32 are thus evaluated using a level- 
15 isotropic Clenshaw-Curtis sparse quadrature rule, containing 3,502,081 distinct 
abscissae. 

Figure 7 shows contours of log^g of the error over a range of p and riquad, for the 
PC expansion of the peak enthalpy release rate. The nquad values are approximate, 
as DASQ is terminated at the end of the iteration that passes nquad- When nquad is 
too small, the error is dominated by aliasing (integration) error and increases with 
p. When a sufficiently large riquad is used such that truncation error dominates, 
exponential convergence with respect to p can be observed, as expected for smooth 
functions. Ideally, nquad and p should be selected at the "knees" of these contour 
plots, since little accuracy can be gained when nquad is increased further, but these 
locations can be difficult to pinpoint a priori. 

6. 5. Design of a single experiment 

Figures 8(a) and 8(b) show contours of the expected utility estimates in the two- 
dimensional design space, ?7(d), constructed using the full ODE model (with estima- 
tor parameters nin = nout = 10^) and the PC surrogate (with estimator parameters 
nin = nout = 10^), respectively. Contours from the PC surrogate are very similar to 
those from the full model, though the former have less variability due to the larger 
number of Monte Carlo samples used to compute f/. Most importantly, both plots 
yield the same optimal experimental design at around (Tq,(^*) = (975,0.5). 

To test how well the expected information gain anticipates the performance of an 
experiment, the inference problem is solved at three different design points A, i?, and 
C, listed in Table 4 and illustrated in Figure 8(b). Since the expected utility is highest 
at design A, then the posterior is expected to reflect the largest information gain at 
that experimental condition. We use the full ODE model to generate artiflcial data 
at each of the three design conditions, then perform inference. Contours of posterior 
density are shown in Figure 9, using the full ODE model and the PC surrogate. 
The posteriors of the full model and PC surrogate match very well; hence the PC 
surrogate is suitable not only for experimental design, but for inference. As expected 
from the expected utility plots, the posterior distribution of the kinetic parameters 
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is tightest at design A; this was the most informative of the three experimental 
conditions. Posterior modes of the ODE model and PC results are not precisely the 
same, however, due to the modeling error associated with the PC surrogate. Also, the 
posterior modes obtained with the full ODE model do not exactly match the values 
used to generate the artificial data, due to the noise in the likelihood model. 

What if a different set of observables are used? Two cases are explored: first, 
using only the characteristic time observables (i.e., the first five rows of Table 2); and 
second, using only the peak value observables (i.e., the last five rows of Table 2). The 
corresponding expected utility plots are given in Figure 10 (using the PC surrogate 
only). Several remarks can be made. First, the characteristic time observables are 
more informative than the peak value observables, as demonstrated by the higher 
expected utility values in Figure 10(b) than Figure 10(c). Second, the choice of 
observables can greatly influence the optimal value of the design parameters. Third, 
even though the observables from the two cases form a partition of the full observable 
set, their expected utility values do not simply sum to that of the full-observable 
case. (This is a special case of the analysis in Appendix A.) The lesson is that the 
selection of appropriate observables is a very important part of the design procedure, 
especially if one is forced to select only a few modes of observation. This selection 
could be made into an argument of the objective function, augmenting d and leading 
to a mixed-integer optimization problem. 

6.6. Design of two experiments: stochastic optimization 

Now we perform a study analogous to that in Section 5.2, designing two ignition 
experiments (of the same structure) simultaneously. The experimental goal of in- 
ferring Ai and £^a,3 is unchanged, and for computational efficiency we use only the 
p = 12, nquad = 10^ PC surrogate. The design space is now four- dimensional, with 
d = [To, 1, 01, To 2, 02]- Stochastic optimization is used to find the optimal experimen- 
tal design, clS cl grid search is entirely impractical. 

Coupling stochastic optimization schemes (Section 2.4) with the estimator U{d) 
of expected information gain introduces a few new numerical tradeoffs. The number 
of samples in the outer loop of the estimator controls the variance of U, which dictates 
the noise level of the objective function. Lower noise in the objective function might 
imply fewer optimization iterations overall, while a noisier objective may require many 
more iterations of either SPSA or NMNS to make progress towards the minimum. On 
the other hand, noise should not be reduced too much for SPSA, since the usefulness 
of its gradient approximation relies on the existence of a non-negligible noise level. 
In general, the task of balancing riout against the number of optimization iterations, 
in order to minimize the number of model evaluations, is not trivial. We therefore 
test different values of riout to understand their impact on the SPSA and NMNS 
optimization schemes. We fix at 10^ in order to maintain a low bias contribution 
from the inner loop. 

Since the results of stochastic optimization are themselves random, we use an 
ensemble of 100 independent optimization runs at any given parameter setting to an- 
alyze performance. Each optimization run is capped at nnoisyObj = 10^ evaluations of 
the noisy objective — i.e., of the summand in Equation 9 — where each noisy objective 
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evaluation itself involves evaluations of the model G or the surrogate G^. We can 
thus compare performance at a fixed computational cost. 

Runs are performed for both SPSA and NMNS, with ?7,out ranging from 1 to 100. 
The final design conditions and convergence histories (the latter plotted for 10 runs 
only) are shown in Figures 11 and 12, respectively. In Figure 11, each connected 
blue cross and red circle represent a pair of final experimental designs, where the red 
circle is arbitrarily chosen to represent the lower To design. The optimization results 
indicate that both experiments should be performed near Tq = 975 K, although the 
best is less precisely determined and less influential. This pattern is similar to that 
of a single-experiment optimal design. Overall, a tighter clustering of the final design 
points is observed as riout is increased. SPSA groups the majority of the final design 
points more tightly, but it also yields more outliers than NMNS; in other words, it 
results in more of a "hit-or-miss" situation. Figure 11 indicates that, for the NMNS 
lower riout lets the algorithm reach the convergence "plateau" more quickly. 
This result is affected both by the shrinkage rate of the simplex and by the fact that 
a higher nout simply requires more model evaluations even to complete one iteration. 
The choice of nout then should take into consideration how many model evaluations 
are available. Even then, the best choice may depend on the shape of the expected 
utility surface and the variance of its estimator (which is not stationary in d). 

Our observations so far are based on an assessment of the design locations and 
convergence history. A more quantitative analysis should focus on the expected util- 
ity value of the final designs, which is what really matters in the end. Figure 13 
shows histograms of expected utility for the 100 final design points resulting from 
each optimization case. To compare the design points, we want to make the error 
incurred in estimating U{d*) relatively negligible, and thus we employ a high-quality 
estimator with riout = "^m = 10^. This is not the small-sample estimator used inside 
the optimization algorithms; it is a more expensive estimator used afterwards, only 
for diagnostic purposes. The histograms indicate that increasing riout is actually not 
very effective for SPSA, as the persistence of outliers creates a spread in the final val- 
ues, supporting our suspicion that too small a noise level may be bad for SPSA. On 
the other hand, increasing nout is effective for NMNS. The bimodal structure of the 
histograms is due to the two groups: good designs on the center "ridge" and outliers, 
with few designs having expected utility values in between. Overall, NMNS performs 
better than SPSA in this study, both in terms of the asymptotic distribution of design 
parameters and how quickly the convergence plateau or "knee" is reached. 

To validate the results, the parameter inference problem is solved with data from 
three two-experiment design points (labeled D, E, and F) and with data from a four- 
experiment factorial design. All of the experimental conditions are listed in Table 5. 
Design D, a pair of experiments lying on the ridge of "good designs," is expected to 
have the tightest posterior among the two-experiment designs. The posteriors are 
shown in Figure 14, using the PC surrogate only. Indeed, design D has the tightest 
posterior, and is much better than the four-experiment factorial design even though it 
uses fewer experiments! The factorial design blindly picks all the corners in the design 
space, which are in general not good design points. (The number of experiments 
in a factorial design would also increase exponentially with the number of design 
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parameters, becoming impractical very quickly.) Model-based optimal experimental 
design is far more robust and capable than this traditional method. 

6.7. Is using a PC surrogate worthwhile? 

In producing the previous results, we used a PC surrogate with p = 12 and riquad = 
10^, though analysis for one observable in Figure 7 suggests that this polynomial 
truncation and adapted quadrature rule are perhaps of higher quality than necessary 
for our problem. To quantitatively determine whether a polynomial surrogate offers 
efficiency gains in this study (or any other), we must (i) check if a lower quality PC 
surrogate may be used while still achieving results of similar quality, and (ii) analyze 
the computational cost. 

To check if a lower-quality PC surrogate would suffice, the same two-experiment 
optimal experimental design problem is solved but now with p = 6. We find that 
'^quad = 10^ is roughly the smallest riquad that still yields reasonable results. The 
final optimization positions (obtained via NMNS only), the convergence history, and 
histogram of final expected utilities are shown in Figure 15. In fact, the histogram 
appears to be even tighter than that obtained with the p = 12 surrogate. 

To analyze the computational cost, we assume that optimization is terminated af- 
ter 5000 noisy objective function evaluations, as the practical convergence plateau is 
reached by that point. If each noisy objective function requires 10^ inner Monte Carlo 
samples and each PC evaluation is negligible compared to full model, then using the 
full ODE model requires 5000 x 10^ = 5 x 10^ model evaluations, whereas the con- 
struction of the PC surrogate requires 10^ full ODE model evaluations. The surrogate 
thus provides a speedup of roughly 3.5 orders of magnitude, saving 49,990,000 full 
model evaluations (roughly 4 months of computational time for the present problem, 
if run serially) . 

Even though a low-quality surrogate may be sufficient for optimal experimental 
design, it may not be sufficiently accurate for parameter inference. For example, the 
two-experiment and four-experiment posteriors obtained using the p = 6, Tiquad = 10^ 
surrogate are shown in Figure 16. These posterior density contours show a substantial 
loss of accuracy compared to the corresponding plots in Figure 14. Because inference 
does not involve averaging over the data space and broadly exploring the design space, 
and it because it generally favors a more restricted range of the model parameters 
6, it may be more sensitive to local errors than the optimal experimental design 
formulation. There are two possible solutions to this issue: 

1. Build and use a high-order polynomial chaos surrogate at the outset of analysis, 
and use it for both optimal experimental design and inference. Because inference 
typically employs MCMC and thus requires many thousands or even millions 
of model evaluations, the combined computational savings will make such a 
surrogate almost certainly worth constructing. 

2. A more efficient approach is to use a low-order polynomial chaos expansion to 
perform the optimal experimental design. Upon reaching the optimal design 
conditions, construct a new PC expansion for inference at that design only. The 
new expansion does not need to capture dependence on the design variables. 
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and thus it involves a smaller dimension and fewer interactions. This less ex- 
pensive local PC expansion can more easily be made sufficiently accurate for 
the inference problem. 

7. Conclusions 

This paper presents a systematic framework and a set of computational tools for 
the optimal design of experiments. The framework can incorporate nonlinear and 
computationally intensive models of the experiment, linking them to rigorous infor- 
mation theoretic design criteria and requiring essentially no simplifying assumptions. 
A flowchart of the overall framework is given in Figure 1 , showing the steps of optimal 
design and their role in a larger design-experimentation-model improvement cycle. 

We focus on the experimental goal of parameter inference, in a Bayesian statisti- 
cal setting, where a good design criterion is shown to maximize expected information 
gain from prior to posterior. A two-stage Monte Carlo estimator of expected infor- 
mation gain is coupled to algorithms for stochastic optimization. The estimation 
of expected information gain, which would otherwise be prohibitive with computa- 
tionally intensive models, is substantially accelerated through the use of flexible and 
accurate polynomial chaos surrogate representations. 

We demonstrate our method ffist on a simple nonlinear algebraic model, then 
on shock tube ignition experiments described by a stiff system of nonlinear ODEs. 
The latter system is challenging to approximate, as certain model observables depend 
sharply on the combustion kinetic parameters and design conditions, and ignition 
delays vary over several orders of magnitude. In both these examples, we illustrate the 
design of single and multiple experiments. We analyze the impact of prior information 
on the optimal designs, and examine the selection of observables according to their 
information value. We also investigate numerical tradeoffs between variance in the 
estimator of expected utility and performance of the stochastic optimization schemes. 

Overall we find that inference at optimal design conditions is indeed very infor- 
mative about the targeted parameters, and that model-based optimal experiments 
are far more informative than those obtained with simple heuristics. The use of 
surrogates offers significant computational savings over stochastic optimization with 
the full model, more than three orders of magnitude in the examples tested here. 
Moreover, we find that the polynomial surrogate used in optimal experimental design 
need not be extremely accurate in order to reveal the correct design points; surrogate 
requirements for optimal design are less stringent than for parameter inference. 

Several promising avenues exist for future work. More efficient means of con- 
structing polynomial chaos expansions, adaptively and in conjunction with stochastic 
optimization, may offer considerable computational gains. Uncertainty in the design 
parameters themselves can also be incorporated into the framework, as in real-world 
experiments where the design conditions cannot be precisely imposed; this additional 
uncertainty could be treated with a hierarchical Bayesian approach. Structural inade- 
quacy of the model G is another important issue; how successful is an "optimal" design 
(or indeed an inference process) based on a forward model that cannot fully resolve 
the physical processes at hand? Our current experience on design with lower-order 
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polynomial surrogates provides a glimpse into issues of structural uncertainty, but 
a much more thorough exploration is needed. Finally, the Bayesian optimal design 

methodology has a natural extension to sequential experimental design, where one set 
of experiments can be performed and analyzed before designing the next set. A rig- 
orous approach to sequential design, incorporating ideas from dynamic programming 
and perhaps sequential Monte Carlo, may be quite effective. 
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Appendix A: Expected Information Gain from Two Experiments 



Here we show that the expected information gain from two experiments is not, in 
general, equal to the sum of the expected information gains due to each experiment 
individually. 

Consider two fixed experimental conditions di and da, with corresponding data yi 
and y2. Following Equation 6, the expected information gain in the model parameters 
6 from performing both experiments is 
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where the second equality is due to application of Bayes' Theorem; the third equality 
uses the fact that p(yi, y2|^, di, d2) = p(yi|0, di)p(y2|^, d2), since the outputs are 
conditionally independent given the designs and parameters, and data from each 
experiment depend only on its own design given the parameters] and the last equality 
results from the marginalization of to obtain the differential entropy h. 

Similarly, the sum of the expected information gain due to each experiment indi- 
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(34) 



where we have also used the fact that p(yi|di) = p(yi|di,d2) to arrive at the last 
equality. Comparing Equations 33 and 34, the first two terms are identical. The 
remaining terms follow the identity 



/i(yi,y2|di,d2) < /i(yi|di,d2) + h(j2\di,d2) 



(35) 



and hence [/([di,d2]) < U{di) + f/(d2). Thus the total expected information gain 
from two simultaneous experiments can never be greater than the sum of the in- 
dividual expected information gains. Equality holds if and only if yi and y2 are 
conditionally independent given di and d2, or equivalently /(yi, y2|di, d2) = 0. In 
other words, equality holds when there is no mutual or "overlapping" information 
between the two sets of data. 



Appendix B: Governing Equations for Homogeneous Combustion 

This appendix describes governing equations for the chemical system analyzed in 
Section 6. Consider combustion in a spatially homogeneous (i.e., well- mixed) system 
at constant pressure. Such "zero-dimensional" systems are frequently used to model 
autoignition in shock tube experiments [82]. Convective and diffusive transport are 
neglected, leaving coupled ordinary differential equations that represent conservation 
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of individual species (36) and of energy (37). The state of the system is completely 
described by the species mass fractions Yi, . . . , Yn^ (where is the total number of 
species) and the temperature T. Governing equations are as follows: 

§ = -:&,, = l...n. (36) 

at p 

dT 1 

— = hnUnWn (37) 

dt pc, ^ 

with initial conditions 

F,(t = 0) =y,,o, T(t = 0)=To, (38) 

where coj [kmol ■ m~^ ■ s~^] is the molar production rate of the jth species, Wj [kg ■ kmol 
is the molecular weight of the jth species, p [kg ■ m^'^] is the mixture density, Cp 
J ■ ■ kg~^] is the mixture specific heat capacity under constant pressure, and /i„ 
J • kg~^] is the specific enthalpy of the nth. species. The molar production rate is 
defined in terms of elementary reaction rates as 



= ^ = E ^""'^3 - ^-.) ^/,- n - kr,m \{ C^-^ ) , (39) 

m=l \ 71=1 n=l 



where Cj [kmol • m^^] is the molar concentration of the jth species, is the total 
number of reactions, and and are the (dimensionless) stoichiometric coef- 
ficients on the reactant and product sides of the equation, respectively, for the nth 
species in the mth reaction. Molar concentrations Cj can be obtained from mass 
fractions Yj as follows: 

The forward and reverse rate constants of the mth reaction, denoted by kf^m and 
fcr,™ respectively, are assumed to have the modified Arrhenius form: 

= A^T^™exp(^) (41) 



exp 



where Ar 



RuT 

is the pre-exponential factor, 6^ 



(m3 ■ kmor^)" +^"=1''™ ■ s-i ■ 

is the exponent of the temperature dependence, -Ea,m [J ■ kmol~^] is the activa- 
tion energy, i?„ = 8314.472 [j ■ kmol^^ ■ K^^] is the universal gas constant, -ft'cm 

y... - J ^"=^ '^™n-^n=i "'^n) ^j^^ equllibrlum constant, and AG^^^^ [J ■ kmol" ^ 

is the change in Gibbs free energy at standard pressure and temperature T. Am, b^, 
and Ea^m are collectively called the kinetic parameters of reaction m. 
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The initial mass fractions Yj^Q are expressed compactly using the dimensionless 
equivalence ratio 0: 

/ _ (^Oa/^Ha) stoic _ i^Oi/^Hi) stoic ^'/lQ^ 

where the subscript "stoic" refers to the stoichiometric ratios and Xj is the molar 
fraction of the jth species, related to the mass fraction through 

In this paper, we use Xj in place of the Yj as the species state variables. We assume 
a perfect gas mixture, thus closing the system with the following equation of state: 

P 

RuTTTnUyn/Wj ^^^^ 

where p [Pa] is the (assumed constant) pressure. 
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1 



step 1: Define experimental goals 



Step 2: Develop objective function 



Step 3: Form numerical approximation 
to objective function 



Step 4: Perform stochastic optimization 
Experimental Design 



Perform experiment 
Collect data 



Experimentation 



Bayeslan parameter Inference 



Model Improvement 




Model Surrogate ^optional) 



Figure 1: A flowchart summarizing the key steps of the design-experimentation-model improvement 
cycle. 
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Algorithm 1: Modified dimension- adaptive sparse quadrature algorithm for 
non-intrusive spectral projection. 

i = (i,-- - ,1); 

O = 0; 
A = {i}; 

for r = 1 to rzcoef do 

Vr = Ai/^; 

Compute hr,i', 
end 

7] = hi- For example, = maXj. hrX, 
while T] > TOL do 

select i from A with the largest hi; 

A = A\{\y, 

= U {i}; 
ri = n-hi, 
for p = 1 to d do 
j = i + ep; 

if j — eg G O for all q — 1, ■ ■ ■ ,d then 

^ = ^U{j}; 
for r = 1 to ncoef do 
Sj. = Aj/r, 

— I Sf* ^ 

Compute /irj; 
end 

Compute ^j; 
ri = ri + hj] 
end 
end 
end 

Symbols: 

O — old index set; 
A — active index set; 

Vr — computed integral value Ylii<^o\jA ®p=i^ipfr for the rth coefficient; 
Ai/r — integral increment ®p=i^ipfr for the rth coefficient; 
hr^i — local error indicator for the rth coefficient; 

— total effect local error indicator; 
7] — global error estimate 'Yli^A^'^^ 
TOL — error tolerance; 
ep — ^pth unit vector; 
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d 



(a) Prior 0r^U{O,l) 




d d 

(b) Prior e - U{0, 0e) (c) Prior 9 - U{0e, 1) 

Figure 2: Estimated expected utility for the design of a single experiment with the simple nonlinear 
model under different priors, where Oe = ^ ^^2.8^ ~ 0-4373. 
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0.2 0.4 0.6 0.8 1 

(a) Prior e'-Z^(0, 1) 




(b) Prior e - W(0, 6'e) (c) Prior 9 - UiOe, 1) 

Figure 3: Estimated expected utility for the design of two experiments with the simple nonlinear 




0.2 0.4 0.6 0.8 1 

e 



Figure 4: Noiseless output of the simple nonlinear model G(6', d) as a function of the uncertain 
parameter 6, at two designs: d ~Q.2 and d = 1. 
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Reaction No. 



Rl 

R2 
R3 

R4 

R5 

R6 

R7 

R8 

R9 

RIO 

Rll 

R12 

R13 

R14 

R15 

R16 

R17 

R18 

R19 



H + O2 

O + H2 
H2 + OH 

OH + OH 
H2 + M 

+ + M 

+ H + M 
H + OH + M 

H + O2 + M 
HO2 + H 
HO2 + H 
HO2 + O 
HO2 + OH 

HO2 + HO2 
H2O2 + M 
H2O2 + H 
H2O2 + H 
H2O2 + O 

H2O2 + OH 



Elementary Reaction 



O + OH 

H + OH 
H2O + H 

O + H2O 
H + H + M 
O2 + M 
OH + M 
H2O + M 
HO2 + M 
H2 + O2 
OH + OH 
O2 + OH 
H2O + O2 
H2O2 + O2 
OH + OH + M 
H2O + OH 
HO2 + H2 
OH + HO2 
HO2 + H2O 



Table 1: 19-reaction hydrogen-oxygen mechanism [86]. Reactions involving the wildcard species M 
are three-body interactions, with different efficiencies for different species. Kinetic parameters of the 
boldface reactions are targeted for inference. 




2 3 
Time [s] 



X 10 



0.6 
0.5 

c 

o 

S 0.4 

£, 

■3 0.3 
0.2 
0.1 

0, 



r 



-"2 

-o 

-OH 
.H^O 

-H 

-H2O2 



2 3 
Time [s] 



4 5 

X 10"*' 



(a) Temperature 



(b) Species molar fractions 



Figure 5: Typical time-evolution of temperature and species molar fractions in H2-O2 ignition. 
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Observable 


Explanation 


on 


Ignition delay, defined as the time of peak enthalpy release rate 


To 


Characteristic time in which peak Xq occurs 


th 


Characteristic time in which peak occurs 


THO2 


Characteristic time in which peak Xho2 occurs 


TH2O2 


Characteristic time in which peak XH2O2 occurs 


dt \ T 


Peak value of enthalpy release rate 


Xo,T 


Peak value of Xq 


Xh,t 


Peak value of Xh 


Xh02,t 


Peak value of Xho2 


Xh202,t 


Peak value of XH2O2 



Table 2: Selected observables for the combustion problem. Note that dh/dt < when enthalpy is 
released or lost by the system. 




Figure 6: Illustration of the observables Tig„, th, and Xh.t hi the combustion problem. 



Parameter 


Lower Bound 


Upper Bound 


In(AiMO) 


-0.05 


0.05 


Ea,3 





2.7196 X 10^ 


To 


900 


1050 


<t> 


0.5 


1.2 



Table 3: Prior support of the uncertain kinetic parameters ^^{Ai/A^) and £"0,3, and ranges of the 
design variables Tq and (p. A uniform prior is assigned to the kinetic parameters. 



Design Point 


To 


A 


975 


0.5 


B 


925 


0.85 


C 


1025 


0.85 



Table 4: Experimental conditions at design points A, B, and C. 
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Temperature [K] Temperature [K] 



(a) Full ODE model (b) PC surrogate 

Figure 8: Estimated expected utility contours in the single-experiment combustion design problem, 
with design variables Tg and (j), using the full ODE model and the PC surrogate with p — 12 and 
'^quad = 10^. Inference problems are then solved at experimental conditions A, B, and C to validate 
the experimental design procedure. 
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(a) Design A full ODE model (b) Design A PC surrogate 




(e) Design C full ODE model (f) Design C PC surrogate 



Figure 9: Contours of posterior density of the kinetic parameters, showing the results of inference 
with data obtained at three different experimental conditions (designs A, B, and C). Left column: 
posteriors constructed using the full ODE model; right column: posteriors constructed via the PC 
surrogate with p — 12, nq^ad — 10^. 
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Temperature [K] 

(a) All observables (same as Figure 8(b)) 




Temperature [K] Temperature [K] 

(b) Characteristic time observables (c) Peak value observables 



Figure 10: Estimated expected utility contours in the single-experiment combustion design problem 
using the PC surrogate with p — 12 and Tiquad = 10^, but with different sets of observables. 



44 



Design Point 


To,i 


01 




02 




03 


2o,4 


04 


D 


970 


0.6 


975 


1.1 










E 


900 


0.5 


1050 


1.2 










F 


900 


1.2 


1050 


0.5 










"factorial" 


900 


0.5 


900 


1.2 


1050 


0.5 


1050 


1.2 



Table 5: Experimental conditions at design points D, E, F, and for a four-point factorial design. 
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950 1000 
Temperature [K] 

(a) SPSA riout = 1 



1050 




950 1000 
Temperature [K] 



1050 



(b) NMNS no 




950 1000 
Temperature [K] 

(c) SPSA riout = 10 



1050 




950 1000 
Temperature [K] 

(d) NMNS flout = 10 



1050 




950 1000 
Temperature [K] 

(e) SPSA Tiout 100 



1050 




950 1000 
Temperature [K] 



1050 



(f) NMNS no 



100 



Figure 11: Two-experiment combustion design problem: final outputs from 100 independent runs of 
stochastic optimization, using SPSA and NMNS with a limit of nnoisyObj = 10**, and with different 
numbers of outer Monte Carlo iterations, using the PC surrogate with p = 12, nquad — 10®. 
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No. (Algorithm-Relevant) Noisy Expected Utility Evals No. (Algorithm-Relevant) Noisy Expected Utility Evals 



(a) SPSA nout = 1 (b) NMNS riout - 1 




2000 4000 6000 8000 10000 
No. (Algorithm-Relevant) Noisy Expected Utility Evals 



S 2.5 




2000 4000 6000 8000 10000 
No. (Algorithm-Relevant) Noisy Expected Utility Evals 



(c) SPSA no 



10 



(d) NMNS nout = 10 




2000 4000 6000 8000 10000 



No. (Algorithm-Relevant) Noisy Expected Utility Evals 

(e) SPSA nout = 100 




2000 4000 6000 8000 10000 



No. (Algorithm-Relevant) Noisy Expected Utility Evals 

(f) NMNS uont = 100 



Figure 12: High-quality expected utility estimates, shown over the course of 10 independent stochas- 
tic optimization runs for the two-experiment combustion design. Each high-quality estimate is based 
on nout = '^in = 10^ samples. Note that the output for NMNS riout = 1 terminates earlier than the 
other cases because its simplices have already shrunk to sizes below machine precision. 
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1 1.5 2 2.5 

High-Quality Approx. to Final Expected Utility 




1.5 2 2.5 

High-Quality Approx. to Final Expected Utility 



(a) SPSA Hout = 1 



(b) NMNS no„t = 1 




High-Quality Approx. to Final Expected Utility 




(c) SPSA no 



10 



High-Quality Approx. to Final Expected Utility 

(d) NMNS nout = 10 




High-Quality Approx. to Final Expected Utility 

(e) SPSA riout = 100 




High-Quality Approx. to Final Expected Utility 



(f) NMNS no 



100 



Figure 13: High-quality expected utility estimates corresponding to the optimization outputs in 
Figure 11. Each high-quahty estimate is based on riout = 'T-in = lO'* samples. 
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(a) Design D 



(b) Design E 




(c) Design F (d) Design "factorial" 

Figure 14: Posterior densities resulting from inference at the experimental conditions listed in Ta- 
ble 5, using the PC surrogate with p = 12, nquad = 10®. 
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25 



20- 



15- 



10- 



1 1.5 2 2.5 

High-Quality Approx. to Final Expected Utility 

(c) Final expected utilities 

Figure 15: Two-experiment combustion design problem: final outputs from 100 independent runs of 
stochastic optimization. Optimization is performed using only a lower-order PC surrogate (p = 6 
and riquad = 10"'). To report performance, the expected utility estimates in subfigures (b) and (c) 
are computed using a higher-order PC surrogate (p = 12 and riquad = 10^) and riout = "-in = 10^ 
samples. Only results with NMNS and riout = 100 are shown. Compare to Figures 11(f). 12(f), and 
13(f). 
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(a) Design D (b) Design E 




(c) Design F (d) Design "factorial" 

Figure 16: Posterior densities resulting from inference at the experimental conditions listed in Ta- 
ble 5, using a lower-order PC surrogate with p = 6, Hquad = 10"*. Compare to Figure 14. 
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